Vancouver is one of the most beautiful cities in Canada, known for the wide variety of tree species planted throughout the city. I have seen some great landscaping and tree-planting projects here; one that really fascinates me is the creative planting of trees on the walls of buildings downtown near Canada Place. However, downtown has limited space for trees because of its many buildings. In the area where I live, Renfrew-Collingwood, I see more trees than downtown and perhaps more than in other areas of Vancouver. The number of trees has certainly grown and will keep growing as new species are planted, but how are these trees distributed across Vancouver's neighbourhoods? How does the median tree height compare across neighbourhoods, and what is the lot-assignment status of the trees in them? What are the average tree diameters across neighbourhoods and street sides? Is there a correlation between tree diameter and height range across different areas? We can answer these questions with an interactive dashboard.
To begin, I will use the pandas library to import the Vancouver Street Tree dataset and prepare it for analysis. The file is a 5,000-tree sample hosted in the University of British Columbia's UBC-MDS GitHub repository.
# Importing altair and pandas libraries
import pandas as pd
import altair as alt
# Loading dataset
df = pd.read_csv('https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv',
parse_dates = ['date_planted'])
df.head()
|  | Unnamed: 0 | std_street | on_street | species_name | neighbourhood_name | date_planted | diameter | street_side_name | genus_name | assigned | ... | plant_area | curb | tree_id | common_name | height_range_id | on_street_block | cultivar_name | root_barrier | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10747 | W 20TH AV | W 20TH AV | PLATANOIDES | Riley Park | 2000-02-23 | 28.5 | EVEN | ACER | N | ... | 15 | Y | 21421 | NORWAY MAPLE | 4 | 0 | NaN | N | 49.252711 | -123.106323 |
| 1 | 12573 | W 18TH AV | W 18TH AV | CALLERYANA | Arbutus-Ridge | 1992-02-04 | 6.0 | ODD | PYRUS | N | ... | 7 | Y | 129645 | CHANTICLEER PEAR | 2 | 2300 | CHANTICLEER | N | 49.256350 | -123.158709 |
| 2 | 29676 | ROSS ST | ROSS ST | NIGRA | Sunset | NaT | 12.0 | ODD | PINUS | N | ... | 7 | Y | 154675 | AUSTRIAN PINE | 4 | 7800 | NaN | N | 49.213486 | -123.083254 |
| 3 | 8856 | DOMAN ST | DOMAN ST | AMERICANA | Killarney | 1999-11-12 | 11.0 | EVEN | FRAXINUS | N | ... | 7 | Y | 180803 | AUTUMN APPLAUSE ASH | 4 | 6900 | AUTUMN APPLAUSE | N | 49.220839 | -123.036721 |
| 4 | 21098 | EAST BOULEVARD | EAST BOULEVARD | HIPPOCASTANUM | Shaughnessy | NaT | 15.5 | ODD | AESCULUS | Y | ... | N | Y | 74364 | COMMON HORSECHESTNUT | 4 | 5200 | NaN | N | 49.238514 | -123.154958 |
5 rows × 21 columns
The Vancouver street tree dataset provides a comprehensive listing of public trees located on boulevards throughout the City of Vancouver. The dataset includes essential attributes that describe the characteristics, locations, and classifications of these trees. Below is a detailed description of the key columns within the dataset:
| Columns | Description |
|---|---|
| Unnamed: 0 | An automatically generated index column. |
| std_street | The standard name of the street where the tree is located. |
| on_street | The name of the street segment where the tree is planted. |
| species_name | The scientific name of the tree species. |
| neighbourhood_name | The name of the neighborhood where the tree is situated. |
| date_planted | The date when the tree was planted. |
| diameter | The trunk diameter of the tree in inches, usually measured at breast height (DBH). It helps in assessing the tree's size and maturity. |
| street_side_name | The side of the street where the tree is located. |
| genus_name | The genus to which the tree species belongs, providing a higher-level classification. |
| assigned | Indicates whether the tree is associated with a nearby lot (Y=Yes, N=No). |
| civic_number | The civic number of the lot associated with the tree. |
| plant_area | A code for the tree's planting area. Values in this sample include numbers such as 7, 10, and 15 and letters such as N. |
| curb | Describes whether the tree is planted near a curb. |
| tree_id | A unique identifier for each tree in the dataset. |
| common_name | The common name of the tree species. |
| height_range_id | A code for the tree's height range in 10-foot bands (for example, 2 = 20 to 30 feet). It can give insights into the tree's maturity and growth. |
| on_street_block | The block number on the street where the tree is located. |
| cultivar_name | The name of the cultivar or variety of the tree species, if applicable. |
| root_barrier | Indicates the presence of a root barrier. |
| latitude | The latitude coordinate of the tree's location. |
| longitude | The longitude coordinate of the tree's location. |
The dataset is refreshed daily on weekdays to ensure it reflects the most recent updates. However, some attributes may not be updated as frequently due to prioritization and resource allocation. The coordinates were initially provided by the 2016 Geospatial Data for City of Vancouver Street Trees project. In cases where latitude and longitude values are 0, it indicates that the location data for those trees is not available.
This dataset serves as a valuable resource for understanding the distribution, diversity, and characteristics of street trees in Vancouver, enabling analyses that can inform urban forestry management and planning.
# Numeric Description
df.describe()
|  | Unnamed: 0 | date_planted | diameter | civic_number | tree_id | height_range_id | on_street_block | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|
| count | 5000.000000 | 2363 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.00000 | 5000.000000 | 5000.000000 | 5000.000000 |
| mean | 14861.920400 | 2003-09-06 04:03:08.912399488 | 12.340888 | 2975.707600 | 128682.584600 | 2.73440 | 2960.227000 | 49.247349 | -123.107128 |
| min | 2.000000 | 1989-10-31 00:00:00 | 0.000000 | 2.000000 | 36.000000 | 0.00000 | 0.000000 | 49.202783 | -123.220560 |
| 25% | 7192.750000 | 1997-11-06 00:00:00 | 4.000000 | 1300.500000 | 61321.500000 | 2.00000 | 1300.000000 | 49.230152 | -123.144178 |
| 50% | 14870.000000 | 2003-02-12 00:00:00 | 10.000000 | 2639.000000 | 130130.500000 | 2.00000 | 2600.000000 | 49.247981 | -123.105861 |
| 75% | 22366.750000 | 2009-11-17 00:00:00 | 18.000000 | 4123.000000 | 191332.000000 | 4.00000 | 4100.000000 | 49.263275 | -123.063484 |
| max | 29992.000000 | 2019-05-07 00:00:00 | 71.000000 | 9113.000000 | 270750.000000 | 9.00000 | 9100.000000 | 49.293930 | -123.023311 |
| std | 8680.023278 | NaN | 9.266600 | 2078.580429 | 75412.260406 | 1.56957 | 2086.861052 | 0.021251 | 0.049137 |
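The numeric description gives a mean tree diameter of 12.34 inches and a median (the 50% row) of 10 inches. The value of 18 inches is the 75th percentile, not the median. Diameters range from 0 to 71 inches; the minimum of 0 may reflect missing or unmeasured values rather than real trees. For height_range_id, the maximum code of 9 corresponds to the 90 to 100 feet range, and the minimum of 0 corresponds to the 0 to 10 feet range. The mean code is about 2.7 and the median code is 2, so a typical tree falls in the 20 to 30 feet band.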
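Because height_range_id is a code rather than a height, translating it into feet makes the summary easier to read. The minimal sketch below assumes that code k covers a 10-foot band from 10k to 10(k+1) feet, which matches the examples in this section (2 is 20 to 30 feet, 9 is 90 to 100 feet). The helper `height_band` and the toy values are hypothetical and do not come from the dataset. The sketch also shows how a few very large trees pull the mean diameter above the median.

```python
import pandas as pd


def height_band(code: int) -> str:
    """Label a height_range_id code as a 10-foot band.

    Assumption: code k covers 10*k to 10*(k + 1) feet (e.g. 2 -> 20 to 30 ft).
    """
    low = 10 * int(code)
    return f"{low} to {low + 10} ft"


# Hypothetical toy rows standing in for df; not real measurements
toy = pd.DataFrame({
    "diameter": [0.0, 4.0, 10.0, 18.0, 71.0],
    "height_range_id": [0, 2, 2, 4, 9],
})

# The mean is pulled up by the one very large tree; the median is not
print("mean diameter:", toy["diameter"].mean())
print("median diameter:", toy["diameter"].median())

# Readable height bands for each code
print(toy["height_range_id"].map(height_band).tolist())
```

On the real data, `df['height_range_id'].map(height_band)` would give a readable label that could stand in for the numeric code on chart axes.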
# Categorical description
df.describe(include = 'object')
|  | std_street | on_street | species_name | neighbourhood_name | street_side_name | genus_name | assigned | plant_area | curb | common_name | cultivar_name | root_barrier |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5000 | 5000 | 5000 | 5000 | 5000 | 5000 | 5000 | 4950 | 5000 | 5000 | 2658 | 5000 |
| unique | 603 | 607 | 171 | 22 | 4 | 67 | 2 | 38 | 2 | 361 | 176 | 2 |
| top | W 13TH AV | CAMBIE ST | SERRULATA | Renfrew-Collingwood | ODD | ACER | N | 10 | Y | KWANZAN FLOWERING CHERRY | KWANZAN | N |
| freq | 52 | 49 | 463 | 384 | 2554 | 1218 | 4564 | 736 | 4593 | 383 | 383 | 4679 |
The categorical description shows 171 distinct species names, and the most frequent is SERRULATA (463 trees). The trees are spread over 22 neighbourhoods, with Renfrew-Collingwood having the most (384). Slightly more than half of the trees (2,554) are on the ODD side of the street. The species belong to 67 genera, and the most common genus is ACER (1,218 trees). Most trees (4,564) are not assigned to a nearby lot. The plant_area column has 38 distinct values; the most frequent is 10 (a numeric code that appears to denote a distance in feet), which appears for 736 trees. Finally, 4,593 trees have curb = Y, and 4,679 trees have no root barrier.
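Raw counts are easier to compare once they are turned into shares. Here is a minimal sketch using pandas `value_counts(normalize=True)` on a few hypothetical toy rows; the real df would be handled the same way.

```python
import pandas as pd

# Hypothetical toy rows with the same column names as df
toy = pd.DataFrame({
    "street_side_name": ["ODD", "ODD", "EVEN", "MED"],
    "root_barrier": ["N", "N", "N", "Y"],
})

# Fraction of trees on each street side (the shares sum to 1)
side_share = toy["street_side_name"].value_counts(normalize=True)

# Fraction of trees without a root barrier: the mean of a boolean Series
no_barrier_share = (toy["root_barrier"] == "N").mean()

print(side_share)
print(no_barrier_share)
```

Applied to the counts in the table above, this gives 2,554 / 5,000 ≈ 51% of trees on the ODD side and 4,679 / 5,000 ≈ 94% of trees without a root barrier.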
# Data information
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 21 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Unnamed: 0 | 5000 non-null | int64 |
| 1 | std_street | 5000 non-null | object |
| 2 | on_street | 5000 non-null | object |
| 3 | species_name | 5000 non-null | object |
| 4 | neighbourhood_name | 5000 non-null | object |
| 5 | date_planted | 2363 non-null | datetime64[ns] |
| 6 | diameter | 5000 non-null | float64 |
| 7 | street_side_name | 5000 non-null | object |
| 8 | genus_name | 5000 non-null | object |
| 9 | assigned | 5000 non-null | object |
| 10 | civic_number | 5000 non-null | int64 |
| 11 | plant_area | 4950 non-null | object |
| 12 | curb | 5000 non-null | object |
| 13 | tree_id | 5000 non-null | int64 |
| 14 | common_name | 5000 non-null | object |
| 15 | height_range_id | 5000 non-null | int64 |
| 16 | on_street_block | 5000 non-null | int64 |
| 17 | cultivar_name | 2658 non-null | object |
| 18 | root_barrier | 5000 non-null | object |
| 19 | latitude | 5000 non-null | float64 |
| 20 | longitude | 5000 non-null | float64 |

dtypes: datetime64[ns](1), float64(3), int64(5), object(12)
memory usage: 820.4+ KB
The dataset contains 5,000 samples and 21 columns. date_planted has 2,637 missing values (only 2,363 non-null), cultivar_name has 2,342 (2,658 non-null), and plant_area has 50. date_planted is one of the most useful columns in the dataset, but with more than half of its values missing it is of limited use for visualization.
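The missing-value counts above can be computed directly rather than read off `df.info()`. The helper `missing_summary` below is a hypothetical name, and the toy rows only mimic the real columns.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first.

    Columns with no missing values are left out.
    """
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy rows mimicking a few df columns (None becomes NaT/NaN)
toy = pd.DataFrame({
    "date_planted": pd.to_datetime(["2000-02-23", None, None, "1999-11-12"]),
    "cultivar_name": [None, "CHANTICLEER", "AUTUMN APPLAUSE", None],
    "plant_area": ["15", "7", "7", None],
    "diameter": [28.5, 6.0, 12.0, 11.0],
})

summary = missing_summary(toy)
print(summary)
```

On the real df, the counts in the info table imply about 52.7% missing for date_planted, 46.8% for cultivar_name, and 1.0% for plant_area.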
The following columns are the most relevant to our questions:

- species_name: helps us understand the diversity of tree species in Vancouver.
- neighbourhood_name: lets us analyze how trees are distributed across neighbourhoods.
- genus_name: helps us understand the variety of tree types in Vancouver's urban forest.
- diameter: an important metric for assessing tree size and age across neighbourhoods.
- height_range_id: the height range category of each tree, which lets us compare tree sizes across areas of Vancouver.
- assigned: whether a tree is linked to a nearby lot (Y = Yes) or not (N = No), which helps us compare trees with and without lot assignments.
- street_side_name: where the trees are located on the street (ODD, EVEN, MED, and BIKE MED).
- root_barrier: whether a root barrier is installed for each tree.

For readability, we replace N with No and Y with Yes in the assigned and root_barrier columns.
df['assigned'] = df['assigned'].replace({'N':'No', 'Y':'Yes'})
df['root_barrier'] = df['root_barrier'].replace({'N':'No', 'Y':'Yes'})
bar = alt.Chart(df).mark_bar().encode(
x = alt.X('count()', title = 'Number of trees', axis = alt.Axis(grid = False)),
y = alt.Y('neighbourhood_name', title = 'Vancouver neighbourhoods', sort = '-x')
).properties(title = 'Fig 1. Tree Distribution Across Vancouver Neighborhoods')
# Put the total count on each bar
text = bar.mark_text(align = 'left', dx = 2, fontWeight = 700).encode(text = 'count()')
# Combining bar chart and text
chart1 = (bar + text)
chart1
In Figure 1, Renfrew-Collingwood stands out with the highest number of trees, and Kensington-Cedar Cottage, Hastings-Sunrise, and Dunbar-Southlands also have many trees. Strathcona has the fewest trees of all Vancouver neighbourhoods. Now that we know how many trees each area has, the next step is to compare median tree heights across neighbourhoods by lot-assignment status. Let's find out.
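The numbers behind Figure 1 can also be checked directly in pandas. `value_counts()` returns the per-neighbourhood counts, sorted from most to fewest trees. The toy rows here are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows standing in for df
toy = pd.DataFrame({"neighbourhood_name": [
    "Renfrew-Collingwood", "Strathcona", "Renfrew-Collingwood",
    "Sunset", "Renfrew-Collingwood", "Sunset",
]})

# The same counts Figure 1 plots, as a Series sorted in descending order
counts = toy["neighbourhood_name"].value_counts()
print(counts)
print("most trees:", counts.idxmax(), "| fewest trees:", counts.idxmin())
```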
neighbourhood = df['neighbourhood_name'].unique().tolist()
chart2 = alt.Chart(df).mark_point().encode(
x = alt.X('median(height_range_id)', title = 'Median Tree height range'),
y = alt.Y('neighbourhood_name', title = 'Vancouver neighbourhoods', scale = alt.Scale(domain = neighbourhood)),
color = alt.Color('assigned', title = 'Assigned'),
size = alt.Size('assigned'),
tooltip = ['median(height_range_id)', 'neighbourhood_name', 'assigned']
).properties(title = 'Fig 2. Median Tree Heights of Assigned vs. Unassigned Across Neighborhoods')
chart2
In Figure 2, most Vancouver neighbourhoods have the same median height range code of 2 (20 to 30 feet) for both assigned and unassigned trees. In several areas, including Strathcona, Shaughnessy, Kitsilano, Kerrisdale, and Kensington-Cedar Cottage, the median height range of assigned trees is higher than that of unassigned trees. Dunbar-Southlands shows the opposite: there, assigned trees have a lower median height range. These differences should be read with caution. Only 436 of the 5,000 trees (5,000 minus 4,564) are assigned to a nearby lot, so each neighbourhood's median for assigned trees rests on a small number of trees. It is also worth knowing how large the trees are and which side of the street they grow on. Next, let's look at the average tree diameter on each street side across Vancouver neighbourhoods.
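One way to see how much each median in Figure 2 can be trusted is to report it next to the number of trees behind it. The sketch below uses pandas named aggregation on hypothetical toy rows; the column names match df after the Yes/No replacement.

```python
import pandas as pd

# Hypothetical toy rows; not real data
toy = pd.DataFrame({
    "neighbourhood_name": ["Kitsilano"] * 5 + ["Sunset"] * 4,
    "assigned": ["No", "No", "No", "Yes", "Yes", "No", "No", "No", "Yes"],
    "height_range_id": [2, 2, 3, 4, 5, 2, 2, 1, 2],
})

# Median height code and group size per neighbourhood and assignment status,
# with the assignment status spread into columns
summary = (
    toy.groupby(["neighbourhood_name", "assigned"])["height_range_id"]
    .agg(median="median", n="size")
    .unstack("assigned")
)
print(summary)
```

A median computed from one or two assigned trees, as in the toy Sunset row, can differ from the unassigned median by chance alone.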
street_side = sorted(df['street_side_name'].unique().tolist())
chart3 = alt.Chart(df, width = 150).mark_rect().encode(
x = alt.X('street_side_name', title = 'Street side', scale = alt.Scale(domain = street_side)),
y = alt.Y('neighbourhood_name', title = 'Vancouver neighbourhoods', scale = alt.Scale(domain = neighbourhood)),
color = alt.Color('mean(diameter)', title = 'Mean diameter (inch)'),
tooltip = ['neighbourhood_name', 'street_side_name', 'mean(diameter)']
).properties(title = 'Fig 3. Average Tree Diameter by Neighborhood and Street Side in Vancouver')
chart3
Figure 3 displays the average tree diameter across Vancouver neighbourhoods and street sides. Only Downtown has trees planted on bike-lane medians (BIKE MED), with an average diameter of 3 inches. Strathcona and West End have no trees in the median strip (MED). Surprisingly, Shaughnessy has the highest average diameter in the median strip, 23 inches, followed by Oakridge, Dunbar-Southlands, and Arbutus-Ridge. In the other areas, median-strip trees average between 5 and 10 inches. Trees on the odd- and even-numbered sides of the street generally average 10 to 15 inches, except in Downtown, where the average is below 10 inches. So far, we have compared median tree height ranges and average tree diameters across neighbourhoods, assignment status, and street sides. This raises another question: what is the relationship between tree diameter and height range across Vancouver neighbourhoods? Let's find out.
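The heatmap in Figure 3 is a neighbourhood-by-street-side table of mean diameters. A pandas `pivot_table` gives the same layout, and the empty cells in the chart correspond to NaN entries, meaning no trees on that side in that neighbourhood. The toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; not real data
toy = pd.DataFrame({
    "neighbourhood_name": ["Downtown", "Downtown", "Downtown", "Shaughnessy", "Shaughnessy"],
    "street_side_name": ["BIKE MED", "EVEN", "EVEN", "MED", "ODD"],
    "diameter": [3.0, 6.0, 8.0, 23.0, 14.0],
})

# Rows: neighbourhoods; columns: street sides; cells: mean diameter (inches)
heat = toy.pivot_table(
    index="neighbourhood_name",
    columns="street_side_name",
    values="diameter",
    aggfunc="mean",
)
print(heat)
```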
# List and sort all categories
cat_class = sorted(df['neighbourhood_name'].unique().tolist())
# Binding selection
select = alt.binding_select(name = 'Neighbourhood Name: ', options = cat_class)
# Selection point with binding
menu = alt.selection_point(fields = ['neighbourhood_name'], bind = select)
# Scatter plot with selection point
chart4 = alt.Chart(df, width = 350, height = 300).mark_circle(size = 45).encode(
x = alt.X('height_range_id', title = 'Tree height range', scale = alt.Scale(domain = [0, 10])),
y = alt.Y('diameter', title = 'Tree diameter (inch)'),
stroke = alt.Stroke('neighbourhood_name', legend = None),
tooltip = ['height_range_id', 'diameter', 'species_name', 'genus_name', 'street_side_name', 'assigned'],
opacity = alt.condition(menu, alt.value(0.95), alt.value(0))
).add_params(menu).properties(title = 'Fig 4. Relationship between tree diameter and height range across neighbourhoods')
chart4
The scatter plot suggests a positive relationship between tree diameter and height range: trees in higher height ranges tend to have larger diameters, and this pattern appears across the neighbourhoods available in the menu.
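To put a number on the visual impression from Figure 4, one option is a Spearman rank correlation per neighbourhood. A rank-based measure fits here because height_range_id is an ordered code rather than a measured height. The sketch computes Spearman's coefficient as the Pearson correlation of average ranks, which avoids a SciPy dependency. The helper names are hypothetical, and the toy rows are made up.

```python
import pandas as pd


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of average ranks (ties share a rank)."""
    return x.rank().corr(y.rank())


def spearman_by_neighbourhood(frame: pd.DataFrame) -> pd.Series:
    """Correlation between height_range_id and diameter within each neighbourhood."""
    return pd.Series({
        name: spearman(group["height_range_id"], group["diameter"])
        for name, group in frame.groupby("neighbourhood_name")
    })


# Hypothetical toy rows; Sunset includes a tie in height_range_id
toy = pd.DataFrame({
    "neighbourhood_name": ["Kitsilano"] * 4 + ["Sunset"] * 4,
    "height_range_id": [1, 2, 3, 4, 2, 2, 3, 4],
    "diameter": [5.0, 9.0, 15.0, 30.0, 6.0, 8.0, 10.0, 25.0],
})

result = spearman_by_neighbourhood(toy)
print(result.round(3))
```

Values near 1 indicate that larger height codes almost always go with larger diameters. With the real df, neighbourhoods that have few trees will produce noisier coefficients.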
title = alt.TitleParams('Fig 5. Distribution of Root Barriers Across Vancouver Neighborhoods', anchor = 'middle', dy = -5)
chart5 = alt.Chart(df).mark_bar().encode(
x = alt.X('count()', title = 'Number of trees'),
y = alt.Y('neighbourhood_name', sort = 'x', title = 'Vancouver neighbourhoods'),
color = alt.Color('root_barrier', title = 'Root barrier'),
column = alt.Column('root_barrier', title = None),
tooltip = ['neighbourhood_name', 'root_barrier', 'count()']
).resolve_scale(y = 'independent').properties(title = title)
chart5
Figure 5 shows that in most Vancouver neighbourhoods, the majority of trees have no root barrier. Hastings-Sunrise, Renfrew-Collingwood, and Sunset stand out with the most root barrier installations: 41, 39, and 36, respectively. This raises two questions. Why are root barriers not more widely used in Vancouver, and do they have any effect on tree health?
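Raw installation counts partly reflect how many trees a neighbourhood has. Renfrew-Collingwood, for example, has the most trees (384), so its 39 barriers amount to about 10% of its trees. This sketch reports both the count and the share per neighbourhood, using hypothetical toy rows.

```python
import pandas as pd

# Hypothetical toy rows, after the Yes/No replacement
toy = pd.DataFrame({
    "neighbourhood_name": ["Sunset", "Sunset", "Strathcona",
                           "Hastings-Sunrise", "Hastings-Sunrise", "Hastings-Sunrise"],
    "root_barrier": ["Yes", "No", "No", "Yes", "Yes", "No"],
})

# Flag barrier trees, then count them (sum of booleans) and compute their share (mean)
barrier = (
    toy.assign(has_barrier=toy["root_barrier"].eq("Yes"))
    .groupby("neighbourhood_name")["has_barrier"]
    .agg(installed="sum", share="mean")
)

# Neighbourhoods with the most installations
top = barrier.sort_values("installed", ascending=False).head(3)
print(top)
```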
chart6 = alt.Chart(df).mark_point(size = 70, filled = True).encode(
x = alt.X('mean(diameter)', title = 'Average tree diameter (inches)', axis = alt.Axis(gridColor = 'brown', gridOpacity = 0.1)),
y = alt.Y('neighbourhood_name', title = 'Vancouver neighbourhoods', scale = alt.Scale(domain = neighbourhood)),
color = alt.Color('root_barrier', title = 'Root barrier', scale = alt.Scale(scheme = 'set1')),
tooltip = ['neighbourhood_name', 'root_barrier', 'mean(diameter)']
).properties(title = 'Fig 6. Average Tree Diameter with Root Barrier Installation Across Neighborhoods')
chart6
Figure 6 shows that, in each neighbourhood, trees with root barriers have a smaller average diameter (roughly 4 to 8 inches) than trees without them (roughly 10 to 16 inches). This gap suggests that root barriers may be associated with smaller trees. However, the chart alone cannot show that barriers hinder development, because other factors, such as when the trees were planted, could also explain the difference.
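One way to probe the tree-age explanation is to compare diameters with and without a barrier among trees planted in the same decade. This is only a sketch on hypothetical toy rows. With the real df, the more than half of trees with no date_planted would drop out, because groupby-based pivots skip missing keys by default.

```python
import pandas as pd

# Hypothetical toy rows; not real data
toy = pd.DataFrame({
    "date_planted": pd.to_datetime(
        ["1995-03-01", "1996-05-01", "2012-04-01", "2014-06-01", "2013-01-01"]
    ),
    "root_barrier": ["No", "No", "Yes", "No", "Yes"],
    "diameter": [20.0, 18.0, 4.0, 5.0, 3.0],
})

# Planting decade, e.g. 1995 -> 1990
toy["decade"] = toy["date_planted"].dt.year // 10 * 10

# Mean diameter by decade (rows) and root-barrier status (columns)
by_decade = toy.pivot_table(
    index="decade", columns="root_barrier", values="diameter", aggfunc="mean"
)
print(by_decade)
```

If barrier trees in the real data turn out to be concentrated in recent decades, comparing within a decade gives a fairer picture than the overall averages in Figure 6.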
In this project, we visualized the Vancouver street tree dataset provided by the University of British Columbia. We focused on several key columns from the dataset to address our project questions.
In Figure 1, Strathcona has the fewest trees and Renfrew-Collingwood the most, while Kensington-Cedar Cottage, Hastings-Sunrise, and Dunbar-Southlands also have many trees. Overall, a substantial number of trees have been planted across Vancouver neighbourhoods, and this number is likely to grow over time.
In Figure 2, we compared median tree height ranges across Vancouver areas by lot-assignment status. In most areas, the median height is 20 to 30 feet whether or not trees are assigned to a nearby lot. In several areas, including Strathcona, Shaughnessy, Kitsilano, Kerrisdale, and Kensington-Cedar Cottage, assigned trees have a higher median height range than unassigned trees, while in Dunbar-Southlands they have a lower one. Because only 436 trees are assigned to a nearby lot (4,564 are not), these comparisons rest on small samples.
In Figure 3, we visualized the average tree diameter across Vancouver areas and street sides. Only Downtown has trees on bike-lane medians (BIKE MED), with an average diameter of 3 inches. Strathcona and West End have no trees in the median strip (MED). Surprisingly, Shaughnessy has the highest average median-strip diameter, 23 inches, followed by Oakridge, Dunbar-Southlands, and Arbutus-Ridge; in other areas, median-strip trees average between 5 and 10 inches. Trees on the odd- and even-numbered sides of the street generally average 10 to 15 inches, except in Downtown, where the average is below 10 inches.
In Figure 4, we plotted tree diameter against height range across Vancouver neighbourhoods. There appears to be a clear positive relationship: trees in higher height ranges tend to have larger diameters.
In Figures 5 and 6, we found that most trees in Vancouver neighbourhoods have no root barrier. Hastings-Sunrise, Renfrew-Collingwood, and Sunset have the most installations, with 41, 39, and 36, respectively. Trees with root barriers also have smaller average diameters than those without. This may mean that root barriers hinder tree growth, which could partly explain why they are rarely installed. However, the charts show only an association, and factors such as tree age could also account for the difference.
Several interesting questions remain for further visualization: which tree species and genera are most common in each Vancouver area, how trees are distributed across street sides in each neighbourhood, and how the trees are geographically distributed across Vancouver.
In the dashboard below, the scatter plot acts as the selector. Brushing a rectangular region on the scatter plot filters the other plots to the trees inside that region. The scatter plot also has a drop-down menu for focusing on a specific Vancouver neighbourhood; choosing a neighbourhood shows only that neighbourhood's data in the other plots. The two dot plots (median height by assignment status and average diameter by root barrier) have clickable legends, and the heatmap has radio buttons for choosing a street side.
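The brush filter can be pictured in pandas terms: it keeps only the rows whose x and y values fall inside the brushed rectangle. The function below is a rough, hypothetical analogue for illustration. Vega-Lite evaluates the real filter in the browser, and this sketch simply treats the bounds as inclusive.

```python
import pandas as pd


def brush_filter(frame: pd.DataFrame, x_range, y_range,
                 x: str = "height_range_id", y: str = "diameter") -> pd.DataFrame:
    """Keep rows whose (x, y) values fall inside a rectangular brush.

    A rough pandas analogue of passing an interval selection to
    transform_filter; bounds are treated as inclusive here.
    """
    inside_x = frame[x].between(*x_range)
    inside_y = frame[y].between(*y_range)
    return frame[inside_x & inside_y]


# Hypothetical toy rows
toy = pd.DataFrame({
    "neighbourhood_name": ["Sunset", "Kitsilano", "Downtown", "Sunset"],
    "height_range_id": [1, 3, 5, 4],
    "diameter": [4.0, 12.0, 30.0, 20.0],
})

# A brush covering height codes 2 to 4 and diameters 10 to 25 inches
selected = brush_filter(toy, (2, 4), (10, 25))
print(selected["neighbourhood_name"].tolist())
```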
# Plot selector
# This repeats the scatter plot above, without the fixed x-axis domain
cat_class = sorted(df['neighbourhood_name'].unique().tolist())
select = alt.binding_select(name = 'Neighbourhood Name: ', options = cat_class)
menu = alt.selection_point(fields = ['neighbourhood_name'], bind = select)
chart4 = alt.Chart(df, width = 350, height = 300).mark_circle(size = 45).encode(
x = alt.X('height_range_id', title = 'Tree height range'),
y = alt.Y('diameter', title = 'Tree diameter (inch)'),
tooltip = ['height_range_id', 'diameter', 'species_name', 'genus_name', 'street_side_name', 'assigned'],
stroke = alt.Stroke('neighbourhood_name', legend = None),
opacity = alt.condition(menu, alt.value(1), alt.value(0))
).add_params(menu).properties(title = 'Fig 4. Relationship between tree diameter and height range across neighbourhoods')
# The second option for selection
interval = alt.selection_interval()
selector = chart4.encode(color = alt.condition(interval, 'neighbourhood_name', alt.value('white'))).add_params(interval).properties(
title = 'Relationship between tree diameter and height range across neighbourhoods'
)
# Make the legend of Chart 2 clickable
legend_bind = alt.selection_point(fields = ['assigned'], bind = 'legend')
plot1 = chart2.encode(color = alt.condition(legend_bind, 'assigned', alt.value('white'))).add_params(legend_bind)
# Linking plot 1 with selector plot
panel_1 = plot1.encode(opacity = alt.condition(menu, alt.value(1), alt.value(0))).add_params(menu).transform_filter(interval).properties(
title = 'Median Tree Heights of Assigned vs. Unassigned Across Neighborhoods'
)
# Add radio buttons to Chart 3 for choosing a street side
sort_street_side = sorted(df['street_side_name'].unique().tolist())
radio = alt.binding_radio(name = 'Street side name: ', options = sort_street_side)
button = alt.selection_point(fields = ['street_side_name'], bind = radio)
plot2 = chart3.encode(color = alt.condition(button, 'mean(diameter)', alt.value('white'))).add_params(button)
# Linking plot 2 with selector plot
panel_2 = plot2.encode(opacity = alt.condition(menu, alt.value(1), alt.value(0))).add_params(menu).transform_filter(interval).properties(
title = 'Average Tree Diameter by Neighborhood and Street Side in Vancouver'
)
# Make the legend of Chart 6 clickable
leg_bind = alt.selection_point(fields = ['root_barrier'], bind = 'legend')
plot3 = chart6.encode(color = alt.condition(leg_bind, 'root_barrier', alt.value('white'))).add_params(leg_bind)
# Linking plot 3 with selector plot
panel_3 = plot3.encode(opacity = alt.condition(menu, alt.value(1), alt.value(0))).add_params(menu).transform_filter(interval).properties(
title = 'Average Tree Diameter with Root Barrier Installation Across Neighborhoods'
)
((selector | panel_1).resolve_scale(color = 'independent', size = 'independent', shape = 'independent', stroke = 'independent')
& (panel_2 | panel_3).resolve_scale(color = 'independent', size = 'independent', shape = 'independent', stroke = 'independent'))